Add /v1/project/batchDelete API method that deletes with SQL #4383

Open

mikael-carneholm-2-wcar wants to merge 4 commits into master

Conversation

mikael-carneholm-2-wcar

Description

This patch adds a /v1/project/batchDelete method that makes project life-cycle management more effective since multiple projects can be deleted per request.

Addressed Issue

This fixes #3361

Additional Details

This is a modified version of #3407 that uses pure SQL to do the deletion instead, which makes it much faster:

  • The previous implementation was able to delete 61 projects/second (best case, with a batch size of 1000)
  • This implementation has been verified to delete 430 projects/second (with a batch size of 1000)

A caveat is that it uses SQL Common Table Expressions (CTEs), which have been part of the ANSI standard since SQL:1999 but are still not supported by all DB vendors. The H2 database used for the unit tests has experimental support for CTEs, but does not support CTE DELETE statements. For this reason, the code checks the name of the JDBC driver and only runs certain statements if the driver is org.postgresql.Driver.
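For illustration, a minimal sketch of how such a driver-conditional CTE DELETE could look is shown below. The helper class, the table and column names, and the use of the database product name from connection metadata (rather than the driver class name the PR actually checks) are assumptions for the sake of the example, not code from this PR.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.UUID;

final class CteDeleteSketch {

    // Deletes rows referencing the given projects with a CTE, guarded so the
    // statement only runs against PostgreSQL. Table and column names are
    // illustrative, not the PR's actual SQL.
    static void deleteProjectProperties(final Connection conn, final UUID[] uuids) throws SQLException {
        final boolean isPostgres = "PostgreSQL"
                .equalsIgnoreCase(conn.getMetaData().getDatabaseProductName());
        if (!isPostgres) {
            return; // fall back to ORM-based deletion on databases without CTE DELETE support
        }
        final String sql = """
                WITH doomed AS (
                    SELECT "ID" FROM "PROJECT" WHERE "UUID" = ANY(?)
                )
                DELETE FROM "PROJECT_PROPERTY"
                 WHERE "PROJECT_ID" IN (SELECT "ID" FROM doomed)
                """;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            // Bind the whole UUID batch as a single SQL array parameter
            ps.setArray(1, conn.createArrayOf("uuid", uuids));
            ps.executeUpdate();
        }
    }
}
```

Handling the whole UUID array in one statement per table is presumably where the throughput gain over per-object ORM deletion comes from.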

Benchmarks using a local sample DB populated with ~11000 projects imported from a production system:

batch_size      time
100             0m34.193s
200             0m27.759s
500             0m31.075s
1000            0m25.451s

NB: The effect of the batch size is limited by the checkpoint configuration (see https://www.enterprisedb.com/blog/basics-tuning-checkpoints), but in this test the container DB was running with a default configuration.

Checklist

  • [x] I have read and understand the contributing guidelines
  • [ ] This PR fixes a defect, and I have provided tests to verify that the fix is effective
  • [x] This PR implements an enhancement, and I have provided tests to verify that it works as intended
  • [ ] This PR introduces changes to the database model, and I have added corresponding update logic
  • [ ] This PR introduces new or alters existing behavior, and I have updated the documentation accordingly

)
@ApiResponses(value = {
        @ApiResponse(responseCode = "204", description = "Projects removed successfully"),
        @ApiResponse(responseCode = "207", description = "Access is forbidden to the projects listed in the response"),
Member

Not sold on this status code. Some surface-level research leads me to believe using this outside of WebDAV is not intended: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/207

Wouldn't it be better to atomically fail the request if deletion of one or more projects failed?

Author

Fair point, the 207 indeed seems to be reserved for WebDAV. The reason for implementing it this way was based on the outcome of this discussion, where the suggestion by sebD was to return a 204 with a body containing the list of inaccessible projects. The 207 was my suggestion, and since I got a thumbs up I thought I would do it the same way in this PR. I see value in letting the client know which projects are inaccessible, so I'm leaning towards going with sebD's suggestion instead of just returning a 401. Would that be an acceptable solution?

Member

I think sebD's original comment was geared towards the situation where we proceed with deleting accessible projects when there have been inaccessible projects in the provided list. In that case you do need to communicate the partial success somehow.

My point is that no projects should be deleted at all, if at least one of them is inaccessible. You can then still respond with 401, and you could use ProblemDetails to communicate the inaccessible projects in a machine-readable way.

We do this for tag endpoints already:

Author

Ok, I'll look into using ProblemDetails since that will make it consistent with how failed tag deletion is handled. Thanks for the tip!
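For reference, a rough sketch of the all-or-nothing behaviour suggested above, returning an RFC 9457 style problem document that lists the inaccessible projects. The field names and the plain Map entity are illustrative assumptions; the actual implementation would reuse the project's existing ProblemDetails handling, as the tag endpoints already do.

```java
import jakarta.ws.rs.core.Response;

import java.util.List;
import java.util.Map;
import java.util.UUID;

final class BatchDeleteProblemSketch {

    // Build a single failure response instead of deleting a subset of projects.
    // The 401 status follows the discussion above; the body is a generic
    // problem+json document, not Dependency-Track's actual ProblemDetails class.
    static Response accessDenied(final List<UUID> inaccessibleUuids) {
        final Map<String, Object> problem = Map.of(
                "status", 401,
                "title", "Project access denied",
                "detail", "Access is forbidden to one or more of the requested projects; no projects were deleted",
                "inaccessibleProjects", inaccessibleUuids);
        return Response.status(401)
                .type("application/problem+json")
                .entity(problem)
                .build();
    }
}
```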

@ApiResponse(responseCode = "401", description = "Unauthorized")
})
@PermissionRequired(Permissions.Constants.PORTFOLIO_MANAGEMENT)
public Response deleteProjects(List<UUID> uuids) {
Member

Consider using @Size(min = 1, max = ...) here. The limits should then also appear in the OpenAPI spec automatically.

I think there should be some upper limit to the number of UUIDs being submitted in one go. We can always extend it later if people need it, but we can't reduce it later if larger batches end up causing problems.

Author

That's a good idea. I have tested it with up to 1000 UUIDs, so maybe that could be a reasonable max size to start with?

Member

Yeah that sounds reasonable.
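A sketch of how the agreed constraint could look on the resource method; the surrounding class is a hypothetical stand-in, and the max of 1000 follows from the exchange above.

```java
import jakarta.validation.constraints.Size;
import jakarta.ws.rs.core.Response;

import java.util.List;
import java.util.UUID;

// Hypothetical stand-in for the actual resource class; only the annotated
// parameter matters here.
public class ProjectBatchDeleteSignatureSketch {

    // Bean Validation rejects an empty list or more than 1000 UUIDs before the
    // method body runs, and the limits appear in the generated OpenAPI spec.
    public Response deleteProjects(@Size(min = 1, max = 1000) final List<UUID> uuids) {
        // ... existing deletion logic ...
        return Response.noContent().build();
    }
}
```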

try (QueryManager qm = new QueryManager()) {
    for (Iterator<UUID> it = uuids.iterator(); it.hasNext();) {
        UUID uuid = it.next();
        final Project project = qm.getObjectByUuid(Project.class, uuid, Project.FetchGroup.ALL.name());
Member

What's the reason for using FetchGroup.ALL here?

Author

Indeed, that doesn't seem to be needed since Project.accessTeams is part of the default fetch group. ProjectQueryManager.hasAccess only needs that property from the project, so I'll change to the getObjectByUuid(Class, UUID) signature instead.
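A before/after sketch of the change described above (a fragment of the loop shown earlier; the second variable name is adjusted only so both lines can stand side by side):

```java
// Before: loads the project with FetchGroup.ALL, pulling in far more state than needed
final Project project = qm.getObjectByUuid(Project.class, uuid, Project.FetchGroup.ALL.name());

// After: the default fetch group already includes accessTeams, which is all
// that ProjectQueryManager.hasAccess needs
final Project projectDefaultFetch = qm.getObjectByUuid(Project.class, uuid);
```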

""");
executeAndCloseWithArray(sqlQuery, (Object) uuidsArray);

// The below has only been tested with Postgres, but should work on any RDBMS supporting SQL:1999
Member

For completeness, how many RDBMSes have you tested this with? Having tested at least H2, MSSQL, and PostgreSQL would be good. We have a Docker Compose file for MSSQL here: https://github.com/DependencyTrack/dependency-track/blob/master/dev/docker-compose.mssql.yml

Author

I've only tested it with H2 and PostgreSQL so far (H2 in unit tests, PostgreSQL in system tests) but I can try migrating my test data to MSSQL and test that as well. I'll get back to you with the results when done!

Member

No need to do a full load test or anything. We just need assurance that the queries work as expected. Sadly the small differences in SQL syntax between the different RDBMSes turn out to be tripwires more often than not. :D

Author

True, true, and true :) But I'll need some data in the schema to pass the condition that the UUID list is not empty, so I'll try to migrate the relatively small exported sample that I created with pg_sample.

@nscuro added the "enhancement" (New feature or request) label on Nov 19, 2024
@nscuro added this to the 4.13 milestone on Nov 19, 2024

codacy-production bot commented Nov 19, 2024

Coverage summary from Codacy

Coverage variation: +0.01% (target: -1.00%)
Diff coverage:      86.05% (target: 70.00%)

Coverage variation details:

                                    Coverable lines   Covered lines   Coverage
  Common ancestor commit (593b4f9)  22588             17880           79.16%
  Head commit (3450de7)             22674 (+86)       17951 (+71)     79.17% (+0.01%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details:

                        Coverable lines   Covered lines   Diff coverage
  Pull request (#4383)  86                74              86.05%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified> / <coverable lines added or modified> * 100%

Labels: enhancement (New feature or request)
Projects: None yet
Development: Successfully merging this pull request may close the issue "Make project life-cycle management more effective"
Participants: 2